In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
import torch


In [None]:
API_TOKEN="<your HF token>"

In [None]:
def load_model(model_name="mistralai/Mistral-7B-Instruct-v0.2", quantization=True):
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,  # this config is only built when quantization is requested
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.float16,
    ) if quantization else None

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        token=API_TOKEN,
        quantization_config=quant_config,
        device_map="cuda:0"
    )
    
    # Reuse API_TOKEN instead of hardcoding a token in the notebook.
    tokenizer = AutoTokenizer.from_pretrained(model_name, token=API_TOKEN)
    return model, tokenizer
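
A token pasted into a cell is easy to leak when the notebook is shared. One possible alternative, sketched below, reads it from an environment variable. The helper name `get_hf_token` is hypothetical, and `HF_TOKEN` is assumed to be the variable you exported before starting Jupyter.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Return the Hugging Face token stored in an environment variable.

    `get_hf_token` is an illustrative helper, not part of any library.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face token."
        )
    return token
```

With this in place, `API_TOKEN = get_hf_token()` can replace the literal assignment, and the notebook no longer contains the secret itself.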

# Clear prompts

In [3]:
model,tokenizer = load_model()

Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00,  1.80s/it]


In [4]:
with open("user_feedback.csv", "r") as f:
    feedback = f.read()

In [5]:
feedback

'Name,Email,Feedback,Rating,Date\nEmily Harper,emily.harper91@example.com,Love the interface—very user-friendly! Would love to see dark mode.,4,2025-04-10\nDaniel Cho,dcho.dev@example.net,The app crashes occasionally when I open settings. Please fix.,2,2025-04-09\nPriya Nair,priya.nair88@example.org,Great customer support. Quick and helpful response.,5,2025-04-08\nMarcus Langley,marcus.l23@example.com,"Needs more customization options, but overall solid.",3,2025-04-07\nZoe Chen,zchen_design@example.net,Absolutely love the new update—smooth and fast.,5,2025-04-07\nJamal Thompson,jthompson.dev@example.org,Couldn’t log in after the last update. Had to reinstall.,2,2025-04-06\nIsabella Russo,isar94@example.com,Clean UI and easy to navigate. Keep up the good work!,4,2025-04-06\nKevin Müller,kevin.muller@example.de,"Some minor bugs, but overall very useful for daily tasks.",4,2025-04-05\nAmina El-Sayed,amina.sayed@example.com,Missing a few key features for team collaboration.,3,2025-04-04\nT

In [6]:
messages = [
    {"role": "user", "content": f"Please remove all personally identifiable information from these customer feedback messages: {feedback}"}
]

In [7]:
encoded = tokenizer.apply_chat_template(messages, return_tensors="pt").cuda()

In [11]:
generated_ids = model.generate(encoded, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] Please remove all personally identifiable information from these customer feedback messages: Name,Email,Feedback,Rating,Date
Emily Harper,emily.harper91@example.com,Love the interface—very user-friendly! Would love to see dark mode.,4,2025-04-10
Daniel Cho,dcho.dev@example.net,The app crashes occasionally when I open settings. Please fix.,2,2025-04-09
Priya Nair,priya.nair88@example.org,Great customer support. Quick and helpful response.,5,2025-04-08
Marcus Langley,marcus.l23@example.com,"Needs more customization options, but overall solid.",3,2025-04-07
Zoe Chen,zchen_design@example.net,Absolutely love the new update—smooth and fast.,5,2025-04-07
Jamal Thompson,jthompson.dev@example.org,Couldn’t log in after the last update. Had to reinstall.,2,2025-04-06
Isabella Russo,isar94@example.com,Clean UI and easy to navigate. Keep up the good work!,4,2025-04-06
Kevin Müller,kevin.muller@example.de,"Some minor bugs, but overall very useful for daily tasks.",4,2025-04-05
Amina El-Sa
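
The attention-mask warning above appears because only the token ids are passed to `generate`, and the printed text starts with the prompt because the whole sequence is decoded. A minimal sketch of one way to address both, assuming a recent `transformers` version in which `apply_chat_template` accepts `return_dict=True`; the helper name `generate_reply` is hypothetical:

```python
def generate_reply(model, tokenizer, messages, max_new_tokens=1000):
    """Generate a reply and return only the newly generated text.

    `model` and `tokenizer` are expected to be the objects returned by
    load_model(); this helper is illustrative, not part of transformers.
    """
    # return_dict=True yields both input_ids and attention_mask.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    prompt_len = inputs["input_ids"].shape[1]

    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,  # set explicitly to silence the notice
    )
    # Skip the prompt tokens so the prompt is not echoed back.
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

This sketch targets plain user prompts. For a prefilled assistant turn, the template call would need `continue_final_message=True` instead of `add_generation_prompt=True`.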

In [44]:
messages = [
    {"role": "user", "content": f"""Your task is to anonymize customer feedback for our quarterly review.

Instructions:
1. Replace all customer names with "CUSTOMER_[ID]" (e.g., "Jane Doe" → "CUSTOMER_001").
2. Replace email addresses with "EMAIL_[ID]@example.com".
3. Redact phone numbers as "PHONE_[ID]".
4. If the feedback mentions a specific person by name, replace the name with "PERSON_[ID]".
5. If no PII is found, copy the message verbatim.
6. Output only the processed messages, separated by "---".

Data to process: {feedback}"""}
]
encoded = tokenizer.apply_chat_template(messages, return_tensors="pt").cuda()

In [45]:
generated_ids = model.generate(encoded, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] Your task is to anonymize customer feedback for our quarterly review.

Instructions:
1. Replace all customer names with "CUSTOMER_[ID]" (e.g., "Jane Doe" → "CUSTOMER_001").
2. Replace email addresses with "EMAIL_[ID]@example.com".
3. Redact phone numbers as "PHONE_[ID]".
4. If the feedback mentions a specific person by name, replace the name with "PERSON_[ID]".
5. If no PII is found, copy the message verbatim.
6. Output only the processed messages, separated by "---".

Data to process: Name,Email,Feedback,Rating,Date
Emily Harper,emily.harper91@example.com,Love the interface—very user-friendly! Would love to see dark mode.,4,2025-04-10
Daniel Cho,dcho.dev@example.net,The app crashes occasionally when I open settings. Please fix.,2,2025-04-09
Priya Nair,priya.nair88@example.org,Great customer support. Quick and helpful response.,5,2025-04-08
Marcus Langley,marcus.l23@example.com,"Needs more customization options, but overall solid.",3,2025-04-07
Zoe Chen
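
Even with explicit instructions, it may be worth checking the model's answer against the source data. The sketch below assumes the CSV keeps the `Name` and `Email` columns shown above. It reports any original name or email that still appears verbatim in the output. `find_leaked_pii` is a hypothetical helper, and it does not catch names mentioned only inside the free-text feedback.

```python
import csv
import io


def find_leaked_pii(source_csv, model_output, fields=("Name", "Email")):
    """Return source values from `fields` that still occur in the model output."""
    leaked = []
    for row in csv.DictReader(io.StringIO(source_csv)):
        for field in fields:
            value = (row.get(field) or "").strip()
            # A plain substring check: simple, but enough to flag verbatim leaks.
            if value and value in model_output:
                leaked.append(value)
    return leaked
```

An empty list means none of the known names or emails survived verbatim. It does not prove the output is free of other PII.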

# Multishot prompting

In [8]:
messages = [
    {"role": "user", "content": f"""Book is to shelf as file is to...?"""}
]
encoded = tokenizer.apply_chat_template(messages, return_tensors="pt").cuda()

In [9]:
generated_ids = model.generate(encoded, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] Book is to shelf as file is to...? [/INST] File is to computer or storage device. Just as a book belongs on a shelf for easy access and organization, a file is stored on a computer or storage device for the same reasons.</s>


In [10]:
messages = [
    {"role": "user", "content": f"""Hand is to glove as foot is to sock.
     Bird is to nest as bee is to hive.
     Book is to shelf as file is to...?"""}
]
encoded = tokenizer.apply_chat_template(messages, return_tensors="pt").cuda()

In [11]:
generated_ids = model.generate(encoded, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] Hand is to glove as foot is to sock.
     Bird is to nest as bee is to hive.
     Book is to shelf as file is to...? [/INST] cabinet or folder, as a book is stored on a shelf and files are stored in a cabinet or folder.</s>
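
In the output above, the indentation of the triple-quoted string ended up inside the prompt. One way to avoid that, sketched here with a hypothetical `build_few_shot_prompt` helper, is to assemble the examples and the question from a list:

```python
def build_few_shot_prompt(examples, question):
    """Join worked examples and the final question, one per line, without stray indentation."""
    lines = [example.strip() for example in examples]
    lines.append(question.strip())
    return "\n".join(lines)


examples = [
    "Hand is to glove as foot is to sock.",
    "Bird is to nest as bee is to hive.",
]
prompt = build_few_shot_prompt(examples, "Book is to shelf as file is to...?")
messages = [{"role": "user", "content": prompt}]
```

Keeping the examples in a list also makes it easy to compare zero-shot, one-shot, and multishot variants of the same question.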


# XML tags

In [10]:
messages = [
    {"role": "user", "content": f"""What is the capital of the Czech Republic?"""}
]
encoded = tokenizer.apply_chat_template(messages, return_tensors="pt").cuda()

In [11]:
generated_ids = model.generate(encoded, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] What is the capital of the Czech Republic? [/INST] The capital city of the Czech Republic is Prague (Praha in Czech). It is the largest city in the country and is known for its rich history, cultural and architectural landmarks, and beautiful old town centers. Prague has been the political, cultural, and economic center of the Czech state for many centuries.</s>


In [8]:
messages = [
    {"role": "user", "content": f"""What is the capital of the Czech Republic? Put your answer inside <result> tags."""}
]
encoded = tokenizer.apply_chat_template(messages, return_tensors="pt").cuda()

In [9]:
generated_ids = model.generate(encoded, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] What is the capital of the Czech Republic? Put your answer inside <result> tags. [/INST] <result>Prague</result>

While Prague is the largest city in the Czech Republic and its cultural, political and economic center, it's not actually the capital city. The capital city is, in fact, Prague's smaller neighboring city, České Budějovice, but most people commonly refer to Prague as the capital due to its significant role in the country. The administrative offices of the government, however, are indeed located in Prague.</s>
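
The tags make the answer easy to pull out programmatically, whatever the model adds around it. A minimal sketch with a hypothetical `extract_result` helper follows. It is meant to run on the reply only, because the prompt itself contains the text `<result>` and would otherwise confuse the match.

```python
import re

RESULT_PATTERN = re.compile(r"<result>(.*?)</result>", re.DOTALL)


def extract_result(reply):
    """Return the text inside the first <result>...</result> pair, or None if absent."""
    match = RESULT_PATTERN.search(reply)
    return match.group(1).strip() if match else None


# Abbreviated copy of the decoded output shown above.
decoded_text = (
    "<s> [INST] What is the capital of the Czech Republic? "
    "Put your answer inside <result> tags. [/INST] <result>Prague</result>\n\n"
    "While Prague is the largest city in the Czech Republic ..."
)
# Keep only the part after the Mistral instruction marker before matching.
reply = decoded_text.split("[/INST]", 1)[-1]
answer = extract_result(reply)
```

Here the extracted value is the tagged answer alone, so the extra commentary the model appended after the closing tag is ignored.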


# Prefill responses

In [7]:
messages = [
    {"role": "user", "content": f"""Convert the following data into JSON {feedback}"""},
    {"role": "assistant", "content": """{"""}
]
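# continue_final_message=True leaves the assistant turn open so generation continues from the prefilled "{".
encoded = tokenizer.apply_chat_template(messages, continue_final_message=True, return_tensors="pt").cuda()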

In [8]:
generated_ids = model.generate(encoded, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] Convert the following data into JSON Name,Email,Feedback,Rating,Date
Emily Harper,emily.harper91@example.com,Love the interface—very user-friendly! Would love to see dark mode.,4,2025-04-10
Daniel Cho,dcho.dev@example.net,The app crashes occasionally when I open settings. Please fix.,2,2025-04-09
Priya Nair,priya.nair88@example.org,Great customer support. Quick and helpful response.,5,2025-04-08
Marcus Langley,marcus.l23@example.com,"Needs more customization options, but overall solid.",3,2025-04-07
Zoe Chen,zchen_design@example.net,Absolutely love the new update—smooth and fast.,5,2025-04-07
Jamal Thompson,jthompson.dev@example.org,Couldn’t log in after the last update. Had to reinstall.,2,2025-04-06
Isabella Russo,isar94@example.com,Clean UI and easy to navigate. Keep up the good work!,4,2025-04-06
Kevin Müller,kevin.muller@example.de,"Some minor bugs, but overall very useful for daily tasks.",4,2025-04-05
Amina El-Sayed,amina.sayed@example.com,Missing a few key features f
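
The model only generates what comes after the prefilled `{`, so the prefill has to be re-attached before the text can be parsed. A small sketch under that assumption is below. `parse_prefilled_json` is a hypothetical helper and the sample completion is invented for illustration, not taken from the run above.

```python
import json


def parse_prefilled_json(prefill, completion, eos_token="</s>"):
    """Re-attach the prefilled text to the completion and parse the result as JSON."""
    text = (prefill + completion).strip()
    # Drop a trailing end-of-sequence marker if the decoded text still has one.
    if text.endswith(eos_token):
        text = text[: -len(eos_token)].rstrip()
    return json.loads(text)


# Invented completion, used only to show the call.
sample_completion = '"feedback": [{"rating": 4, "date": "2025-04-10"}]}</s>'
parsed = parse_prefilled_json("{", sample_completion)
```

`json.loads` raises `json.JSONDecodeError` when the model stops early or adds text after the closing brace, which is a useful signal to retry or raise `max_new_tokens`.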

# Roles

In [3]:
model,tokenizer = load_model("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")

Loading checkpoint shards: 100%|██████████| 2/2 [00:20<00:00, 10.11s/it]


In [None]:
messages = [
    {"role": "system", "content": f"""You are a helpful AI assistant."""},
    {"role": "user", "content": f"""Which law is concerned with paying taxes in the U.S.?"""},
    {"role": "assistant", "content": f"""<think>\n"""}
]
encoded = tokenizer.apply_chat_template(messages, continue_final_message=True, return_tensors="pt").cuda()

In [20]:
generated_ids = model.generate(encoded, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<｜begin▁of▁sentence｜>You are a helpful AI assistant.<｜User｜>Which law is concerned with paying taxes in the U.S.?<｜Assistant｜><think>
Okay, so I need to figure out which law in the U.S. is concerned with paying taxes. Hmm, I'm not entirely sure, but I'll try to think it through. I know that taxes are a big part of everyone's life in the U.S., so there must be specific laws governing how taxes are collected, enforced, and structured.

First, I remember hearing about the Internal Revenue Code. That sounds familiar. I think that's the main set of laws related to taxes, but I'm not 100% sure. I think the IRS stands for Internal Revenue Service, which is the government agency that handles taxes. So maybe the Internal Revenue Code is the law that deals with paying taxes.

Wait, but I think there are other laws too. For example, there's the Tax Code, which is the same as the Internal Revenue Code, right? So maybe the Internal Revenue Code is the primary one. But I also recall something about

In [17]:
messages = [
    {"role": "system", "content": f"""You are an expert lawyer who helps users with legal guidance."""},
    {"role": "user", "content": f"""Which law is concerned with paying taxes in the U.S.?"""},
    {"role": "assistant", "content": f"""<think>\n"""}
]
encoded = tokenizer.apply_chat_template(messages, continue_final_message=True, return_tensors="pt").cuda()

In [18]:
generated_ids = model.generate(encoded, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<｜begin▁of▁sentence｜>You are an expert lawyer who helps users with legal guidance.<｜User｜>Which law is concerned with paying taxes in the U.S.?<｜Assistant｜><think>
Okay, so I'm trying to figure out which law is concerned with paying taxes in the U.S. I remember that taxes are a big part of our legal system, but I'm not exactly sure which specific law governs the payment of taxes. Let me think through this step by step.

First, I know that the U.S. has a system of federal, state, and local taxes. Each of these levels has its own set of laws. For federal taxes, I think there's a specific tax code. I've heard of the Internal Revenue Code, which is part of the U.S. Tax Code. So maybe that's one part of it.

Then there are state taxes. Each state probably has its own tax laws. For example, California has a state income tax, and New York has its own. So the tax laws vary by state, but I'm not sure if there's a single overarching state tax code like the federal one.

Local taxes might be si
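
To compare the two system prompts, it helps to separate the reasoning inside `<think>` from the final answer. A minimal sketch, assuming the reply closes its reasoning with `</think>` (a reply cut off by `max_new_tokens` may not); `split_reasoning` is a hypothetical helper:

```python
def split_reasoning(reply, open_tag="<think>", close_tag="</think>"):
    """Split a reply into (reasoning, answer); answer is None if the reasoning never closes."""
    reasoning, separator, answer = reply.partition(close_tag)
    # Remove the opening tag that was prefilled into the assistant turn.
    reasoning = reasoning.replace(open_tag, "", 1).strip()
    if not separator:
        return reasoning, None
    return reasoning, answer.strip()
```

Called on the text after `<｜Assistant｜>`, this returns the chain of thought and the answer separately, so the answers produced under each role can be compared without the surrounding deliberation.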